ABSTRACT

Although the adoption of OGC Web Services for server,
desktop and web applications has been successful, its
penetration in mobile devices has been slow. One of the main
reasons is the performance cost of XML processing, which
consumes a lot of memory and processing time, both scarce
resources in a mobile device. In this paper we propose an
algorithm to generate efficient code for XML data binding
for mobile SOS-based applications. The algorithm takes
advantage of the fact that individual implementations use
only some portions of the standards' schemas, which allows
large XML schema sets to be simplified in an
application-specific manner by using a subset of XML
instance files conforming to these schemas.

Categories and Subject Descriptors 

I.7.2 [Document and Text Processing]: Document
Preparation - Languages and systems, Standards

General Terms 

Performance, Design, Experimentation, Standardization, Lan- 
guages 

Keywords 

XML Schema, Web Services, Geospatial Information, XML 
Data Binding, Sensor Observation Services 

1. INTRODUCTION 

Interoperability is a key concept when building distributed 
applications, as it ensures that service providers and con- 
sumers can exchange information in a way that can be un- 
derstood. In the Geographic Information Systems (GIS) 
field, this interoperability is achieved by using standards 
or implementation specifications, such as those defined by 



the Open Geospatial Consortium (OGC), known as OGC
Web Services (OWS). These standards allow clients to access
geospatial data through a well-defined set of operations.
The specifications define the structure of the XML messages
exchanged between clients and servers using XML Schema [23]
[24].

One of these standards is the Sensor Observation Service
(SOS) Implementation Specification [15], which allows
the publication and consumption of information gathered by
sensors or systems of sensors. This specification has gained
a lot of popularity in recent years, apparently because of
the explosion in the number of sensors and related devices
producing a massive amount of data [18]. Several
implementations of this specification have been presented for
the client and server side, mainly targeted to server, desktop
and web applications. Although the adoption of the standard
for these applications has been successful, its integration in
mobile devices has been slow. One of the main reasons is
the performance problems associated with XML processing:
parsing and serializing XML messages from files
(or communication channels) to memory and vice versa
consumes a lot of memory and processing time, which are
scarce resources in a mobile device.

According to [25], XML processing can be implemented
using a vocabulary-independent data access interface, such as
those provided by SAX or DOM, or using a
vocabulary-dependent data access interface, where XML data is
mapped into application-specific concepts. The first option is
recognized to be difficult and error-prone, producing code that
is hard to modify and maintain. The second option, also
known as XML Data Binding, is favoured because such
interfaces "relieve developers from the burden of mapping data
from a vocabulary-independent DAI (Data Access Interface) to
application-specific data structures. Developers can focus on
the semantics of the data they are manipulating, while leaving
the type conversion to the vocabulary-specific DAI
implementation" [25].
XML data binding code is often produced by using code gen- 
erators. Code generators provide an attractive approach, 
potentially giving benefits such as increased productivity, 
consistent quality throughout all the generated code, and 
the potential to support different programming languages, 
frameworks and platforms. 

Recent studies have shown that XML can be processed
efficiently in resource-constrained devices if the appropriate
methods and tools are used [i] [7]. There are also several
tools available for generating XML data binding code
for mobile devices, such as XBinder and CodeSynthesis
XSD/e, or for building complete web services communication
end-points for resource-constrained environments, such
as gSOAP [19]. Nevertheless, these solutions are not easily
or effectively applied to mobile SOS-based applications. For
example, the solutions presented in [3] [5] [21] for efficient
processing of XML in mobile devices use compression
techniques for XML data. This requires the whole existing
infrastructure of data providers to be modified to offer
data in these compressed formats. On the other hand, XML
data binding code generators tend to map types in schema
files to types in the target language in a straightforward way,
creating a type in the target programming language for
every type in the schema files, so that large schema
files produce large compiled binary files. Last, several OWS
specifications provide no support, or only limited
support, for SOAP. Requests sent in XML format
over HTTP are widespread, although the current trend is to
support SOAP in new specifications [16].

The main hurdle to generating efficient code for mobile
applications in SOS is the large size and complexity of the
schemas associated with the specification. The large size of
the schemas is justified because they must satisfy a large
set of usage scenarios, although individual implementations
frequently use only a small fraction of them. This fact offers
a way of optimizing real implementations by using only the
subset of the schemas that is really necessary for a given
application. Based on this, we present in this paper an
algorithm to simplify large XML schema sets, such as the one
associated with SOS, in an application-specific manner by
using a set of XML instance files conforming to these schemas.
A real use case scenario, the implementation of a mobile SOS
client for the Android platform, is presented to evaluate the
effectiveness of the algorithm.

The remainder of this paper is structured as follows. The
next section presents an introduction to XML Schema, including
the notation and concepts used in this paper. Section 3
presents the algorithm to simplify schema sets based
on a subset of input instance files. Section 4 presents
experimental results using a use case scenario. In Section 5
related work on the subject is presented. Lastly, we present
conclusions and future work.



to express subtyping relationships. This mechanism allows
types to be defined as subtypes of existing types, either by
extending the base type's content model, in the case of
derivation by extension (Child in Figure 1), or by restricting
it, in the case of derivation by restriction. What is
interesting about type derivation is that wherever we find in
the schemas an element of type A, the actual type of the
element in an instance file can be either A or any type derived
from A. This is why in the example an element of type Base
can be substituted by an element of type Child. This
polymorphic situation creates non-explicit dependencies between
types, which we call hidden dependencies.

<complexType name="Base">
  <sequence>
    <element name="baseElem" type="string"/>
    <element ref="baseElem2" minOccurs="0"/>
  </sequence>
</complexType>

<complexType name="Child">
  <complexContent>
    <extension base="Base">
      <sequence>
        <element name="chdElem" type="string"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

<complexType name="ContainerType">
  <sequence>
    <element name="item" type="Base" maxOccurs="unbounded"/>
  </sequence>
</complexType>

<element name="Container" type="ContainerType"/>

<element name="baseElem2" type="string"/>

Figure 1: XML Schema file fragment. 
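
The polymorphism described for Figure 1 can be checked mechanically. The following is a minimal sketch, not part of the original work: it wraps the fragment in a schema root element (an assumption, since the root is omitted in the figure) and computes which types may actually appear where a given type is declared. The function names `direct_subtypes` and `substitutable_types` are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Figure 1 wrapped in an assumed <schema> root with the XML Schema namespace.
SCHEMA = """
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <complexType name="Base">
    <sequence>
      <element name="baseElem" type="string"/>
      <element ref="baseElem2" minOccurs="0"/>
    </sequence>
  </complexType>
  <complexType name="Child">
    <complexContent>
      <extension base="Base">
        <sequence>
          <element name="chdElem" type="string"/>
        </sequence>
      </extension>
    </complexContent>
  </complexType>
  <complexType name="ContainerType">
    <sequence>
      <element name="item" type="Base" maxOccurs="unbounded"/>
    </sequence>
  </complexType>
  <element name="Container" type="ContainerType"/>
  <element name="baseElem2" type="string"/>
</schema>
"""


def direct_subtypes(schema_xml):
    """Map each base type name to the global types derived directly from it."""
    root = ET.fromstring(schema_xml)
    subtypes = {}
    for ctype in root.findall(XS + "complexType"):
        for tag in ("extension", "restriction"):
            for node in ctype.iter(XS + tag):
                subtypes.setdefault(node.get("base"), set()).add(ctype.get("name"))
    return subtypes


def substitutable_types(schema_xml, declared):
    """Return the declared type plus every type derived from it, transitively."""
    subtypes = direct_subtypes(schema_xml)
    found, pending = {declared}, [declared]
    while pending:
        for sub in subtypes.get(pending.pop(), ()):
            if sub not in found:
                found.add(sub)
                pending.append(sub)
    return found
```

Under these assumptions, `substitutable_types(SCHEMA, "Base")` contains both Base and Child, which is the hidden dependency that an element declared with type Base carries.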



2. XML SCHEMA 

XML Schema is used to define the structure of information
contained in XML instance files [23] [24]. The structure
is defined using schema components such as complex
types, simple types, elements, attributes, and element and
attribute groups. An instance document conforming to this
structure is said to be valid against the schema. We denote
the set of all valid files against a schema S as I(S). Figure 1
shows a fragment of a schema file. The file contains the
declaration of three global complex types and two global
elements. For the sake of simplicity we have omitted the schema
root element and namespace declarations.

In Figure 2 we can see two valid instance documents for
this schema. In the second instance we can observe that the
item element is of type Child instead of type Base. This
is because XML Schema provides a derivation mechanism



Apart from type derivation, a second subtyping mechanism
is provided through substitution groups. This feature
allows global elements to be substituted by other elements
in instance files. A global element E, referred to as the head
element, can be substituted by any other global element that
is defined to belong to E's substitution group.

In the following subsection we introduce the notation used
in the remainder of the paper to refer to nodes and components
in XML instance documents and schema
files, respectively. We also define the concepts, relations and
operations necessary to present our algorithm. After this we
present a brief description of the SOS schemas.

2.1 Notation 

To refer to nodes contained in instance files, we will use
XPath notation. XPath expressions are shown in bold.
The following examples, referring to nodes in Figure 2, should
suffice to understand the notation used in the remainder of
this paper:



Instance 1:

<Container>
  <item>
    <baseElem>String Value 1</baseElem>
  </item>
  <item>
    <baseElem>String Value 2</baseElem>
  </item>
</Container>

Instance 2:

<Container>
  <item xsi:type="Child">
    <baseElem>Base String Value</baseElem>
    <chdElem>Child String Value</chdElem>
  </item>
</Container>

Figure 2: Valid XML fragments for schema fragment 
in Figure 1. 

• /Container refers to the root node of the instance files.

• /Container/item refers to all items contained in the
root element.

• /Container/item[i] refers to the item in position i inside
/Container. Positions are counted starting at 1.
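
As a small illustration of this notation, the sketch below evaluates the three example expressions against instance 2 of Figure 2 using Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset with paths relative to the root element. The `xmlns:xsi` declaration is added here as an assumption, since the figure omits namespace declarations.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Instance 2 of Figure 2, with the xsi namespace declared explicitly.
INSTANCE_2 = """
<Container xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <item xsi:type="Child">
    <baseElem>Base String Value</baseElem>
    <chdElem>Child String Value</chdElem>
  </item>
</Container>
"""

root = ET.fromstring(INSTANCE_2)     # /Container
items = root.findall("item")         # /Container/item
first = root.find("item[1]")         # /Container/item[1], positions start at 1

# The dynamic type announced by the instance through xsi:type.
dynamic_type = first.get("{%s}type" % XSI)
```

Here `dynamic_type` holds the value of the `xsi:type` attribute, which is how the instance signals that the item is of type Child rather than the declared type Base.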

To refer to components in schema files, we will use the
following notation:

• To refer to global types and elements, we use their names
in italics, e.g. Container, ContainerType, etc.

• To refer to attributes or elements within types, model
groups or attribute groups, we add the name of the enclosing
component and a colon as a prefix to the attribute or element
name. The whole expression is written in italics. For example:
ContainerType:item, Base:baseElem, Child:chdElem, etc.

For the purpose of our discussion we define the concept of 
schema set in the following way: 

Definition 1: A schema set is a tuple $S = (T_S, E_S, A_S,
MG_S, AG_S, R_S)$, where $T_S$ is the set of all type
definitions, $E_S$ is the set of all element declarations,
$A_S$ is the set of all attribute declarations, $MG_S$ is the
set of all element group (model group) definitions, $AG_S$ is
the set of all attribute group definitions, and $R_S$ is a set
of binary relations (described later) between components of
$T_S$, $E_S$, $A_S$, $MG_S$ and $AG_S$.
Components included in the sets $T_S$, $MG_S$ and $AG_S$ are
composed of a set of inner components. In the case of types,
inner components can refer to global elements, attributes,
model groups and attribute groups, or they can be nested
element and attribute declarations. Model groups may contain
references to global elements and other model groups,
or they may contain nested element declarations. Similarly,
attribute groups may contain references to global attributes
and other attribute groups, or they may contain nested
attribute declarations. Inner components can be optional,
meaning that it is legal for them not to appear in all valid
instance documents. For example, element baseElem2 in Base
is optional; as such, the items in Figure 2 are valid even if
they do not contain this element.

The binary relations contained in Rs are: 

• isOfType(x, t): relates an element or attribute
x to its corresponding type t. For example:
isOfType(Container, ContainerType),
isOfType(Base:baseElem, string).

• reference(x, y): relates $x \in T_S \cup MG_S \cup AG_S$ to
$y \in E_S \cup A_S \cup MG_S \cup AG_S$ if x references y in
its definition using the ref attribute in any of its
components, e.g. reference(Base, baseElem2).

• contains(x, y): relates $x \in T_S \cup MG_S \cup AG_S$ to
$y \in E_S \cup A_S$ if x defines y as an inner attribute or
element in its declaration, e.g. contains(Base, Base:baseElem),
contains(Child, Child:chdElem), contains(ContainerType,
ContainerType:item).

• isDerivedFrom(t, b): relates a type t to its base type
b, e.g. isDerivedFrom(Child, Base).

• isInSubstitutionGroup(x, y): relates an element x
to another element y if y is the head element of x's
substitution group.

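The schema set S for the schema fragment in Figure 1
is as follows*:

$$
\begin{aligned}
S = \{\; & T_S = \{\mathit{Base},\ \mathit{Child},\ \mathit{string},\ \mathit{ContainerType}\}, \\
& E_S = \{\mathit{Container},\ \mathit{baseElem2},\ \mathit{Base{:}baseElem},\ \mathit{Child{:}chdElem},\ \mathit{ContainerType{:}item}\}, \\
& A_S = \emptyset,\quad MG_S = \emptyset,\quad AG_S = \emptyset, \\
& R_S = \{\; \mathit{isOfType} = \{(\mathit{Container}, \mathit{ContainerType}),\ (\mathit{baseElem2}, \mathit{string}), \\
& \qquad\qquad (\mathit{Base{:}baseElem}, \mathit{string}),\ (\mathit{Child{:}chdElem}, \mathit{string}),\ (\mathit{ContainerType{:}item}, \mathit{Base})\}, \\
& \qquad \mathit{isDerivedFrom} = \{(\mathit{Child}, \mathit{Base})\}, \\
& \qquad \mathit{reference} = \{(\mathit{Base}, \mathit{baseElem2})\}, \\
& \qquad \mathit{contains} = \{(\mathit{Base}, \mathit{Base{:}baseElem}),\ (\mathit{Child}, \mathit{Child{:}chdElem}),\ (\mathit{ContainerType}, \mathit{ContainerType{:}item})\}, \\
& \qquad \mathit{isInSubstitutionGroup} = \emptyset \;\}\;\}
\end{aligned}
$$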

Figure 3 shows a graph with all of these relations. This
graph does not reflect hidden dependencies between types
or elements. To include them in the graph we would have to add
an extra edge between ContainerType:item and Child, as the
former may be of type Child in instance files.
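
To make the structure of a schema set concrete, the sketch below encodes the schema set of Figure 1 as plain Python sets and derives the extra edge mentioned above. The `SchemaSet` class and the `hidden_edges` function are hypothetical names introduced for illustration; only direct derivation is considered.

```python
from dataclasses import dataclass, field

RELATIONS = ("isOfType", "reference", "contains",
             "isDerivedFrom", "isInSubstitutionGroup")


@dataclass
class SchemaSet:
    """Hypothetical encoding of S = (T_S, E_S, A_S, MG_S, AG_S, R_S)."""
    types: set = field(default_factory=set)
    elements: set = field(default_factory=set)
    attributes: set = field(default_factory=set)
    model_groups: set = field(default_factory=set)
    attribute_groups: set = field(default_factory=set)
    relations: dict = field(
        default_factory=lambda: {name: set() for name in RELATIONS})


FIGURE_1 = SchemaSet(
    types={"Base", "Child", "string", "ContainerType"},
    elements={"Container", "baseElem2", "Base:baseElem",
              "Child:chdElem", "ContainerType:item"},
)
FIGURE_1.relations["isOfType"] = {
    ("Container", "ContainerType"), ("baseElem2", "string"),
    ("Base:baseElem", "string"), ("Child:chdElem", "string"),
    ("ContainerType:item", "Base"),
}
FIGURE_1.relations["isDerivedFrom"] = {("Child", "Base")}
FIGURE_1.relations["reference"] = {("Base", "baseElem2")}
FIGURE_1.relations["contains"] = {
    ("Base", "Base:baseElem"), ("Child", "Child:chdElem"),
    ("ContainerType", "ContainerType:item"),
}


def hidden_edges(schema_set):
    """Pairs (element, subtype) implied by derivation but absent from isOfType."""
    edges = set()
    for element, declared in schema_set.relations["isOfType"]:
        for subtype, base in schema_set.relations["isDerivedFrom"]:
            if base == declared:
                edges.add((element, subtype))
    return edges
```

For this encoding, `hidden_edges(FIGURE_1)` yields the single pair (ContainerType:item, Child).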

Next, we define the subset relation for schemas: 

Definition 2: Let $S = (T_S, E_S, A_S, MG_S, AG_S, R_S)$
and $S_1 = (T_{S_1}, E_{S_1}, A_{S_1}, MG_{S_1}, AG_{S_1},
R_{S_1})$ be two schema sets. We say that $S_1$ is a subset of
$S$ if $T_{S_1} \subseteq T_S$, $E_{S_1} \subseteq E_S$,
$A_{S_1} \subseteq A_S$, $MG_{S_1} \subseteq MG_S$,
$AG_{S_1} \subseteq AG_S$, and for every relation $R_{i,S}$ in
$R_S$, $R_{i,S_1} \subseteq R_{i,S}$; for example,
$\mathit{isOfType}_{S_1} \subseteq \mathit{isOfType}_{S}$.

* XML Schema anyType has been omitted purposely to simplify
the exposition.

Figure 3: Graph of relations in schema fragment in
Figure 1

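According to this definition a subset of the schema set in
Figure 1 could be:

$$
\begin{aligned}
S_1 = \{\; & T_{S_1} = \{\mathit{Base},\ \mathit{string},\ \mathit{ContainerType}\}, \\
& E_{S_1} = \{\mathit{Container},\ \mathit{Base{:}baseElem},\ \mathit{ContainerType{:}item}\}, \\
& A_{S_1} = \emptyset,\quad MG_{S_1} = \emptyset,\quad AG_{S_1} = \emptyset, \\
& R_{S_1} = \{\; \mathit{isOfType} = \{(\mathit{Container}, \mathit{ContainerType}),\ (\mathit{Base{:}baseElem}, \mathit{string}),\ (\mathit{ContainerType{:}item}, \mathit{Base})\}, \\
& \qquad \mathit{isDerivedFrom} = \emptyset, \\
& \qquad \mathit{reference} = \emptyset, \\
& \qquad \mathit{contains} = \{(\mathit{Base}, \mathit{Base{:}baseElem}),\ (\mathit{ContainerType}, \mathit{ContainerType{:}item})\}, \\
& \qquad \mathit{isInSubstitutionGroup} = \emptyset \;\}\;\}
\end{aligned}
$$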

Last, we define the union of two schema sets. This oper- 
ation will be used in the following sections. 

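Definition 3: Let $S_1 = (T_{S_1}, E_{S_1}, A_{S_1}, MG_{S_1},
AG_{S_1}, R_{S_1})$ and $S_2 = (T_{S_2}, E_{S_2}, A_{S_2},
MG_{S_2}, AG_{S_2}, R_{S_2})$ be two schema sets. We say that
$S = (T_S, E_S, A_S, MG_S, AG_S, R_S)$ is the union of $S_1$
and $S_2$ if $T_S = T_{S_1} \cup T_{S_2}$,
$E_S = E_{S_1} \cup E_{S_2}$, $A_S = A_{S_1} \cup A_{S_2}$,
$MG_S = MG_{S_1} \cup MG_{S_2}$,
$AG_S = AG_{S_1} \cup AG_{S_2}$, and
$\forall R_{i,S} \in R_S$, $R_{i,S} = R_{i,S_1} \cup R_{i,S_2}$;
e.g. $\mathit{isOfType}_{S} = \mathit{isOfType}_{S_1} \cup
\mathit{isOfType}_{S_2}$.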
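
Definitions 2 and 3 translate directly into set operations. The following sketch shows one possible encoding; the `SchemaSet` class and the functions `is_subset` and `union` are hypothetical and only mirror the component-wise definitions above.

```python
from dataclasses import dataclass, field

COMPONENTS = ("types", "elements", "attributes",
              "model_groups", "attribute_groups")


@dataclass
class SchemaSet:
    """Hypothetical encoding of a schema set; relations map a name to pairs."""
    types: set = field(default_factory=set)
    elements: set = field(default_factory=set)
    attributes: set = field(default_factory=set)
    model_groups: set = field(default_factory=set)
    attribute_groups: set = field(default_factory=set)
    relations: dict = field(default_factory=dict)


def is_subset(s1, s):
    """Definition 2: every component set and every relation of s1 is included in s."""
    components_ok = all(getattr(s1, c) <= getattr(s, c) for c in COMPONENTS)
    relations_ok = all(pairs <= s.relations.get(name, set())
                       for name, pairs in s1.relations.items())
    return components_ok and relations_ok


def union(s1, s2):
    """Definition 3: component-wise and relation-wise union of two schema sets."""
    result = SchemaSet()
    for c in COMPONENTS:
        setattr(result, c, getattr(s1, c) | getattr(s2, c))
    for name in set(s1.relations) | set(s2.relations):
        result.relations[name] = (s1.relations.get(name, set())
                                  | s2.relations.get(name, set()))
    return result
```

With this encoding, both operands of `union` are always subsets of its result, which is the property the subsetting algorithm relies on when it accumulates partial results.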



2.2 SOS Schemas 

As mentioned before, SOS allows the publication and
consumption of information gathered by sensors or sensor
systems. The schemas associated with this service specification
are probably among the most complex geospatial web service
schemas, as they are built on the foundation provided by
other specifications such as Geography Markup Language
(GML) [10], Sensor Model Language (SensorML) [Ti], and
Observation and Measurements (O&M) [12]. GML is a
language for expressing geographical features, which is used as
a common language throughout the OGC specifications.
SensorML is a language used to describe sensors and sensor
systems. O&M is used as the encoding for sensor
observations.

Figure 4 shows the dependencies of the schemas in SOS on
schemas in other specifications. In addition to the
specifications mentioned before, SOS also depends on OWS
Common [13], for common mechanisms in OWS; the Filter Encoding
Implementation Specification [11], to filter the observations
requested from the server; and SWE Common, which contains
shared common data types and data encodings for all of the
specifications related to sensors.



Figure 4: Dependencies of SOS schemas on other
specifications



The whole SOS schema set contains more than 700 complex
types and global elements, which makes it large according
to the categorization of schema size, based on the number
of complex types, presented in [6]. In this categorization a
schema set with a number of complex types in the range
256-1,000 is considered large. The other categories are mini,
0-32 complex types; small, 32-100 complex types; medium,
100-256 complex types; and huge, more than 1,000 complex
types.
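
The categorization can be written as a small lookup. This is only a sketch: the ranges quoted above share their boundary values (32, 100 and 256), so the function below assumes that a boundary value falls in the smaller category; the function name is hypothetical.

```python
def schema_size_category(complex_types: int) -> str:
    """Classify a schema set by its number of complex types.

    Assumption: upper bounds are inclusive, so 32 is 'mini' and 256 is 'medium'.
    """
    if complex_types < 0:
        raise ValueError("the number of complex types cannot be negative")
    if complex_types <= 32:
        return "mini"
    if complex_types <= 100:
        return "small"
    if complex_types <= 256:
        return "medium"
    if complex_types <= 1000:
        return "large"
    return "huge"
```

For example, a schema set with 700 complex types is classified as large under this reading.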

The first version of SWE Common is embedded in the
SensorML implementation specification document, although
its schemas are physically separated in a different folder.



3. SIMPLIFYING SCHEMA FILES 

In practical terms, our problem of simplifying the schema
set related to SOS, denoted as $S_{SOS}$, to the subset that is
used in an actual implementation P, denoted as $S_P$, can
be formulated as follows:

Problem: Calculate $S_P$ starting from $S_{SOS}$ and X, a set
of instance files, knowing that $X \subseteq I(S_P)$, trying
to make $S_P$ as small as possible.

As the set of valid instance files for a schema is potentially
infinite, the resulting schema set should validate correctly all
of the files in X, but might validate other instance files
as well.

The algorithm presented in this section is based on two
main assumptions. The first one is that actual implementations
do not use all of the information contained in the
schemas. Instead, they use only the parts required to
implement specific application requirements. Although this
assumption may seem obvious to some extent, it is supported
by the results presented in [17], which showed that a set of
53 SOS server instances available on the Internet used less
than 30% of the SOS schemas. The second assumption is
that a representative set of instance files is available
beforehand to drive the simplification process. By representative
we mean in this context that every information item in the
instance files that must be parsed by the application must
be represented in at least one of the input instance files.
Unfortunately, if instances with new information must be
added, or existing instances are discarded, the simplification
algorithm must be executed again.

3.1 Helper Functions 

The algorithm to calculate Sp uses the following helper 
operations in its definition: 

• typeOf(node): returns the type of an XML node in
an instance file. For example, in instance 1 in Figure 2,
typeOf(/Container) = ContainerType and
typeOf(/Container/item[1]) = Base. In instance 2,
typeOf(/Container/item[1]) = Child.

• element(node): returns the element declaration matching
the content of node. For example,
element(/Container/item[1]) = ContainerType:item
in both instances in Figure 2.

• containerOf(node): returns the component containing
the definition of, or the reference to, element(node).
For example, containerOf(/Container/item[1]) =
containerOf(ContainerType:item) = ContainerType.

• ancestors(type): returns all of the ancestors of type.

• leaf(node): returns true if node is a leaf, i.e. node
does not contain any child element and has a value.
Examples of leaf nodes in instance 2 in Figure 2
are /Container/item[1]/baseElem and
/Container/item[1]/chdElem.

• root(instance file): returns the root node of instance
file.

• addValueToRelation(S, R(x, y)): adds the pair (x, y) to
the relation R of the schema set S. R must be one of the
relations defined in Section 2.1.

• copyRelations(S_T, S_S, C): copies all relation pairs
between schema components in C from the source
schema set S_S to the target schema set S_T.

3.2 Algorithm 

The algorithm to calculate Sp, henceforth called subset- 
ting algorithm, is expressed as follows: 

Input: X = {x | x is an input instance file}
Input: schema set S

Output: schema subset Sx needed to validate the instances in X

Sx = ∅
For each x in X
beginFor
    T = SchemaSubsetUsedIn(root(x), S)
    Sx = union(Sx, T)
endFor
Result = Sx

The key of this algorithm is the function SchemaSub- 
setUsedIn(node, schema set) that calculates the subset of 
the schemas used in an XML file fragment starting at a 
given node. The second parameter is the schema set defin- 
ing the fragment structure. The result of this function is 
calculated for the root element of all instance files and then 
joined through the union operation defined in the previous 
section. 

Next, we present the algorithm for SchemaSubsetUsedIn.
For the sake of clarity in the exposition of the
algorithm we do not consider attributes and substitution
groups. The code handling these cases is similar to the code
that processes elements and subtypes.

SchemaSubsetUsedIn

Input: instance file node x
Input: schema set S

Output: schema subset Sx needed to validate the
nodes contained in x

Sx = ∅
E_Sx = E_Sx + element(x)
T_Sx = T_Sx + typeOf(x) + ancestors(typeOf(x))
addValueToRelation(Sx, isOfType(element(x),
                                typeOf(element(x))))
copyRelations(Sx, S, ancestors(typeOf(x)))
If not leaf(x) Then
beginIf
    For each child node z of x
    beginFor
        Sx = union(Sx, SchemaSubsetUsedIn(z, S))
        Container = containerOf(z)
        If z belongs to a model group M Then
        beginIf
            MG_Sx = MG_Sx + M
            addValueToRelation(Sx,
                reference(containerOf(M), M))
            Container = M
        endIf
        If z is a reference to a global element Then
            addValueToRelation(Sx,
                reference(Container, element(z)))
        Else
            addValueToRelation(Sx,
                contains(Container, element(z)))
    endFor
endIf
Result = Sx
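
The pseudocode can be tried out on the running example. The sketch below is not the authors' prototype: it replaces a parsed schema with three hand-written lookup tables for Figure 1 (`DECLS`, `BASE_OF` and `GLOBAL_ELEMENTS`, all hypothetical), ignores model groups, attributes and substitution groups, and passes the matching declaration down the recursion instead of computing element(x) from the schema.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hand-written stand-in for Figure 1: type -> {child tag: (type, is_reference)}.
DECLS = {
    "Base": {"baseElem": ("string", False), "baseElem2": ("string", True)},
    "Child": {"chdElem": ("string", False)},
    "ContainerType": {"item": ("Base", False)},
    "string": {},
}
BASE_OF = {"Child": "Base"}                                   # isDerivedFrom
GLOBAL_ELEMENTS = {"Container": "ContainerType", "baseElem2": "string"}

INSTANCE_1 = """<Container>
  <item><baseElem>String Value 1</baseElem></item>
  <item><baseElem>String Value 2</baseElem></item>
</Container>"""
INSTANCE_2 = """<Container
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <item xsi:type="Child">
    <baseElem>Base String Value</baseElem>
    <chdElem>Child String Value</chdElem>
  </item>
</Container>"""


def ancestors(type_name):
    """All base types of type_name, nearest first."""
    chain = []
    while type_name in BASE_OF:
        type_name = BASE_OF[type_name]
        chain.append(type_name)
    return chain


def new_subset():
    return {"T": set(), "E": set(), "isOfType": set(),
            "isDerivedFrom": set(), "reference": set(), "contains": set()}


def union(a, b):
    return {key: a[key] | b[key] for key in a}


def container_of(parent_type, tag):
    """Type in the derivation chain of parent_type that declares child tag."""
    for candidate in [parent_type] + ancestors(parent_type):
        if tag in DECLS[candidate]:
            return candidate
    raise KeyError(tag)


def schema_subset_used_in(node, element_name, declared_type):
    """Subset used by node, whose declaration is element_name of declared_type."""
    actual = node.get(XSI_TYPE, declared_type)    # dynamic type, maybe a subtype
    sx = new_subset()
    sx["E"].add(element_name)
    sx["T"] |= {actual, *ancestors(actual)}
    sx["isOfType"].add((element_name, declared_type))
    chain = [actual] + ancestors(actual)
    sx["isDerivedFrom"] |= set(zip(chain, chain[1:]))
    for child in node:                            # empty for leaf nodes
        owner = container_of(actual, child.tag)
        child_type, is_ref = DECLS[owner][child.tag]
        child_name = child.tag if is_ref else f"{owner}:{child.tag}"
        sx = union(sx, schema_subset_used_in(child, child_name, child_type))
        sx["reference" if is_ref else "contains"].add((owner, child_name))
    return sx


def subset_for(instances):
    """Top-level loop: union of the subsets used by every instance file."""
    result = new_subset()
    for xml_text in instances:
        root = ET.fromstring(xml_text)
        used = schema_subset_used_in(root, root.tag, GLOBAL_ELEMENTS[root.tag])
        result = union(result, used)
    return result
```

Calling `subset_for([INSTANCE_1])` leaves out Child and baseElem2, which matches the example subset given after Definition 2, while adding `INSTANCE_2` to the input brings Child, Child:chdElem and the pair (Child, Base) into the result.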

The algorithm starts by adding the element declaration
matching the content of the node specified as input
to the result. The type of the node is also added,
as well as the pair (element(x), typeOf(element(x))) to
relation isOfType. It is very important to notice at
this point that typeOf(x) and typeOf(element(x)) are
not always the same, because the dynamic type of x
may be a subtype of the declared type of the element
matching its structure. This is the case in instance 2
listed in Figure 2, where typeOf(/Container/item[1])
= Child, but typeOf(element(/Container/item[1])) =
typeOf(ContainerType:item) = Base. All of the ancestors
of the type and all of their relations are also added to the
result to maintain the consistency of the model.

The next step is to analyse the child nodes of x, in
case it has any. For each child node z, we recursively call
the function SchemaSubsetUsedIn, and the schema set
returned by this function is combined with the current
result using the union operation. After this, a set of relation
values is added to maintain the consistency of the model.
First, the container of z is calculated. This container is the
type or model group that contains the element declaration
matching node z. It could be typeOf(x), but could also be any
of its ancestors. It could also be any model group referenced
by typeOf(x) or by any of its ancestors. For example, let us
calculate containerOf(/Container/item[1]/baseElem)
in the second example in Figure 2. Even though
typeOf(/Container/item[1]) is Child, this
type does not contain the declaration matching
/Container/item[1]/baseElem, because it was inherited
from Base; the container is therefore Base.

The relation between element(z) and its container
must be added to the result. The pair
(containerOf(element(z)), element(z)) is added to reference
or to contains, depending on whether the element is referenced
or is a nested declaration. If the container is a model
group, the reference between its own container and the model
group must be added to the result as well.

4. EXPERIMENTATION 

In order to prove the effectiveness of the algorithm we have 
developed a prototype implementation for it. After this we 
have used it with a real case study described in the next 
subsection. 

4.1 Case Study 

The case study is the implementation of the communication
layer of an SOS client targeted to the Android platform.
The client must provide support for the Core Profile
of the SOS specification, which includes the following
operations [15]:

• GetCapabilities: Operation to get metadata informa- 
tion about the service including title, keywords, provider 
information, supported operations, advertised observa- 
tion offerings. 



• DescribeSensor: Operation to get information about a 
given sensor. 

• GetObservation: Operation to get a set of observations 
from a given offering. The observations can be filtered 
by a time instant or interval, location, etc. 

The common flow of interactions between SOS clients and
servers starts when the client issues a GetCapabilities
request to the server, which answers by sending back its
Capabilities file. After parsing this file the client knows which
operations are supported by the server and which information
about sensors and observations can be requested by
issuing DescribeSensor and GetObservation requests.




Figure 5: Location of air pollution control stations 
in the Valencian Community 

On the server side we use a 52° North SOS Server
containing information about air quality for the Valencian
Community gathered by 57 control stations located in that area
(Figure 5). The stations measure the level of different
contaminants in the atmosphere.

In order to measure how much the schemas can be
reduced with the subsetting algorithm, we compare the size of
the original or full schema set with the size of the reduced
or simplified schema set. To measure the size of a schema
set $S = (T_S, E_S, A_S, MG_S, AG_S, R_S)$, we calculate the
cardinality of the first five sets making up the schema set,
and the cardinality of every relation included in $R_S$.

After this, we use some code generators for XML data
binding to measure how much the generated binary files are
reduced when using the simplified schema set. The steps
of the process followed during the experiment are shown in
Figure 6.

52° North SOS Server: http://52north.org/SensorWeb/sos/




Figure 6: Flow diagram for the experiment (input,
subsetting algorithm, XML data binding code generator,
output)

As our work is mostly focused on the production of XML 
processing code, we just consider this part of the implemen- 
tation in the following subsections. 

4.2 Gathering Input Instance Files 

In order to generate the schema subset needed for the SOS 
client we must decide which input to pass to the algorithm. 
To obtain this set of instance files we sent requests manually 
to the server and stored server responses. We gathered 2492 
instance files as input: the capabilities file, 2312 responses 
containing sensor descriptions, and 179 corresponding to ob- 
servations. Our application must be capable of processing 
the following root elements: 

• Capabilities: Server response with the service capabil- 
ities file 

• SensorML: Server response containing information 
about a sensor. 

• ObservationCollection: Server response with observa- 
tions data. 

The first element is defined directly in the SOS specifi- 
cation and the other two are imported from the SensorML 



and O&M specifications, respectively. The number of files 
to be used as input will depend on the requirements of the 
particular application being developed. It might depend on 
availability of the instance files or on how different the con- 
tent of these files is. In our case, although a considerably 
large number of input files was used, just a few would suffice 
because XML tags contained in sensor descriptions and ob- 
servations files were basically the same within the two groups 
of files. 

4.3 Generating the Output Subset 

After applying the algorithm with the input described
above we obtained the results shown in Table 1 and
Figure 7, where the original schema set is compared with the
simplified set. In addition to the cardinalities of components
and relations we use two composite metrics: Total_C for the
sum of the cardinalities of all components and Total_R for the
sum of the cardinalities of relations. The results show that
the subsetting algorithm allows a substantial reduction of
the original schema set, of about 90% of its size.



Table 1: Comparing original and simplified schema
sets

Metric                      Full schema set   Simplified schema set
|T_S|                       846               112
|E_S|                       2020              183
|A_S|                       400               22
|MG_S|                      28                7
|AG_S|                      39                3
|isOfType_S|                2420              205
|reference_S|               968               63
|contains_S|                739               81
|isDerivedFrom_S|           490               74
|isInSubstitutionGroup_S|   290               17
Total_C                     3333              327
Total_R                     4617              423
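
The reduction figure quoted for the components can be reproduced from the component rows of Table 1. The short sketch below recomputes the component totals and the relative reduction; the variable and function names are illustrative only.

```python
# (full, simplified) cardinalities of the five component sets, from Table 1.
TABLE_1_COMPONENTS = {
    "T": (846, 112),
    "E": (2020, 183),
    "A": (400, 22),
    "MG": (28, 7),
    "AG": (39, 3),
}


def reduction_percent(full, simplified):
    """Relative size reduction, in percent, when going from full to simplified."""
    return 100.0 * (1.0 - simplified / full)


total_full = sum(full for full, _ in TABLE_1_COMPONENTS.values())
total_simplified = sum(simp for _, simp in TABLE_1_COMPONENTS.values())
component_reduction = reduction_percent(total_full, total_simplified)
```

The two sums match the Total_C row of the table (3333 and 327), and the resulting reduction is consistent with the figure of about 90% given in the text.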



4.4 Generating Binary Code 

We next explore how this reduction translates into
generated code. Specifically, we use XBinder to generate
code for the Android platform. XBinder is an XML data
binding generator that produces code for several programming
languages (C, C++, Java, C#). It also allows the
generation of code targeted to different mobile platforms,
such as Android and CLDC. We also use other
generators targeted to the Java programming language, but
not to mobile devices, XMLBeans and JAXB-RI,
to show that our algorithm could also be useful for other
kinds of systems.

The main metric used to compare generated code is size,
measured in kilobytes (KB). Source code is generated for
the schemas before and after the simplification algorithm
is applied. Then, the source is compiled and compressed
into a JAR file. All of the generators need, apart from the
generated code, a set of supporting libraries, which is why
we compared the size of the generated code with and without
considering the supporting libraries.

Android: http://www.android.com
CLDC: http://java.sun.com/products/cldc/
XMLBeans: http://xmlbeans.apache.org
JAXB-RI: https://jaxb.dev.java.net

Figure 7: Simple schema metrics for original
schemas and simplified schemas



Table 2: Comparing size of generated code (KB)
for original and simplified schema sets

                XBinder   JAXB   XMLBeans
Full            3626      754    2822
Reduced         567       90     972
Libs            190       1056   2684
Full+libs       3816      1810   8879
Reduced+libs    684       1146   3655



Table 2 shows in the first two rows the comparison of the
code size for the generated code only (Full, Reduced), showing
a large reduction of between 79 and 88%. The metric values
for the code generated using JAXB stand out among the
rest because they are significantly smaller. The third row
shows the size of the supporting libraries of each generator
(Libs). At this point we can see the big difference in size
between the XBinder libraries for Android and the JAXB
and XMLBeans libraries. XBinder, when used to generate
code for mobile devices, uses very light supporting libraries,
but at the expense of moving most of the XML processing code
to the generated code. On the other hand, JAXB has all
of the XML processing code in the supporting libraries and
generates clean and compact code. In the case of XMLBeans,
judging by the large size of the generated code and
supporting libraries, it is clear that it has not been optimized
to work with large schema sets.

The last two rows compare code size including supporting 
libraries (Full+libs, Reduced+libs). In this case, the 
overall reduction in code size is smaller than the one 
presented before, as the size of the libraries 
remains constant. It now ranges from a 37% reduction in 
JAXB to 84% in XBinder. Nevertheless, in all cases the 
reduction of the generated code size is substantial. Moreover, 
the size of the code targeted to mobile devices (684 KB) seems 
manageable for modern devices. 
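
As a worked check of how the constant library size dilutes the reduction, the following minimal Python sketch recomputes the JAXB figures from Table 2. The function name reduction_pct is ours and purely illustrative.

```python
def reduction_pct(full_kb: float, reduced_kb: float) -> float:
    """Percentage by which the reduced size is smaller than the full size."""
    return 100.0 * (full_kb - reduced_kb) / full_kb


# JAXB values taken from Table 2 (sizes in KB).
jaxb_full, jaxb_reduced, jaxb_libs = 754, 90, 1056

# Generated code only (rows Full and Reduced).
code_only = reduction_pct(jaxb_full, jaxb_reduced)

# Generated code plus supporting libraries (rows Full+libs and Reduced+libs).
with_libs = reduction_pct(jaxb_full + jaxb_libs, jaxb_reduced + jaxb_libs)

print(f"{code_only:.1f}% {with_libs:.1f}%")  # prints "88.1% 36.7%"
```

The same 664 KB saving is divided by 754 KB in the first case and by 1810 KB in the second, which is why the percentage drops from about 88% to about 37% once the libraries are included.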

It is important to note that in this experiment none 
of the generators could produce the binary code directly 
without some kind of adjustment. The adjustments 
included manually modifying the generated 
source code to avoid compilation errors, changing 
configuration parameters to avoid name clashes in GML related to 
case sensitivity or components with very similar names, and 
working around the failure of a generator to follow the 
intricate dependencies between schema components. 

5. RELATED WORK 

As mentioned in the introductory section, the solutions for 
achieving efficient processing of XML for mobile devices use 
compression techniques to reduce the size of XML-encoded 
data [3] [5] [21]. These solutions require that the server be 
aware of the compressed formats, so servers already 
online might not be accessible if they cannot be modified to 
support the aforementioned formats. 

Regarding schema transformation, the closest reference to 
the algorithm presented in this paper is the GML subsetting 
tool [10], which allows the extraction of GML schema 
subsets called profiles. This tool has two limitations: 
it can only be applied to GML schemas, and it does not handle 
polymorphic dependencies related to subtyping (Section 2). 

Other products that can be compared with our algorithm 
are generators that make some attempt to simplify the final 
code structure, such as JiBX or XML Schema Definition 
Tool (henceforth called XSD.NET). JiBX offers the option 
of restricting the generated code to only those parts of the 
schemas that are referenced from other schema components. 
Unfortunately, JiBX does not support dynamic typing of 
elements in instance files, preventing its use to process 
geospatial schemas. XSD.NET is provided as part of the 
development tools of the .NET Framework. XSD.NET is the only 
product known by the authors that performs optimizations 
while still preserving all of the type dependencies, both 
explicit and hidden. Still, none of these tools allows 
information to be extracted from instance files to customize 
the generated code. 

Regarding the use of instance files to drive the 
manipulation of schemas, a lot of work has been done on 
schema inference, where instance files are used to generate 
adequate schema files that can be used to assess their 
validity (e.g. [1] [2] [s] [9]). This problem is different from 
the one presented here, where schemas already exist but must be 
refined to adjust to more specific requirements. 

http://jibx.sourceforge.net 

http://msdn.microsoft.com/en-us/library/x6clkb0s.aspx 



6. CONCLUSIONS 

In this paper we have presented an algorithm to generate 
efficient code for XML data binding for mobile SOS-based 
applications. The algorithm takes advantage of the fact that 
individual implementations use only portions of the 
standards' schemas. This allows particular customizations to be 
applied by simplifying large schema sets, such as the one 
associated with SOS, in an application-specific manner by using a 
set of XML instance files conforming to these schemas. 
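
To make the underlying idea concrete, the following minimal Python sketch collects which element names and xsi:type values actually occur in a set of instance documents. It is not the simplification algorithm itself, only an illustration of the information that instance files expose. The function name used_names, the sample document, and the namespace http://example.org/obs are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Attribute name ElementTree uses for xsi:type (the standard XSI namespace).
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def used_names(instance_documents):
    """Collect qualified element names and raw xsi:type values seen in instances."""
    elements, xsi_types = set(), set()
    for text in instance_documents:
        root = ET.fromstring(text)
        for el in root.iter():  # visits the root and every descendant element
            elements.add(el.tag)  # tags look like "{namespace}localName"
            if XSI_TYPE in el.attrib:
                xsi_types.add(el.attrib[XSI_TYPE])  # kept as written, e.g. "obs:X"
    return elements, xsi_types


# Hypothetical instance document, used only for illustration.
sample = """<obs:Observation xmlns:obs="http://example.org/obs"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <obs:result xsi:type="obs:MeasureType">21.5</obs:result>
</obs:Observation>"""

elements, xsi_types = used_names([sample])
```

Schema components whose names never appear in such sets are candidates for removal, although a real implementation would also have to follow type dependencies inside the schemas.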

Results of applying the algorithm to a real-world use case 
scenario have shown that the algorithm allows a substantial 
reduction of the original schema set, of about 90% of its 
size. This large reduction in schema size translates into 
a reduction in generated binary code of more than 80% of 
its size for an SOS client targeted to the Android platform. 
As the transformation is done at the schema level and the 
algorithm makes no assumption about the target platform, 
it can still be used for other kinds of SOS applications. 
Nevertheless, the resource constraints associated with mobile 
devices make the algorithm far more useful in this area. 

This algorithm could also be applied to other OWS 
specifications. However, given the authors' limited experience 
with specifications other than SOS, we cannot state that 
the reduction would be as large as that obtained in the use 
case scenario presented in Section 4. 

Further work will integrate the algorithm into a code 
generator targeted to mobile devices. This code generator will 
take advantage of other useful information that can be 
extracted during the simplification process, which will allow 
the generated code to be optimized further. In addition, a 
performance study of the generated code would be valuable, 
covering other aspects such as memory consumption 
and execution speed. 